With information from multiple input modalities, sensor fusion-based algorithms usually outperform their single-modality counterparts in robotics. Carrying complementary semantic and depth information, cameras and LiDARs are the typical choices for detection tasks in complicated driving environments. For most camera-LiDAR fusion algorithms, however, the calibration of the sensor suite will greatly affect performance. More specifically, detection algorithms usually require an accurate geometric relationship among multiple sensors as input, and it is often assumed that the contents from these sensors are captured at the same time. Preparing such a sensor suite involves carefully designed calibration rigs and accurate synchronization mechanisms, and the preparation process is usually done offline. In this work, a segmentation-based framework is proposed to jointly estimate the geometric and temporal parameters in the calibration of a camera-LiDAR suite. A semantic segmentation mask is first applied to both sensor modalities, and the calibration parameters are optimized through a pixel-wise bidirectional loss. We specifically incorporate velocity information from optical flow for the temporal parameters. Since supervision is performed only at the segmentation level, no calibration label is needed within the framework. The proposed algorithm is tested on the KITTI dataset, and the results show accurate real-time calibration of both the geometric and temporal parameters.
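The abstract does not spell out the pixel-wise bidirectional loss, so the following is only a rough, hypothetical sketch of the underlying idea: score how well LiDAR points, projected through candidate extrinsics, land on image pixels of the same semantic class, and let an optimizer maximize that agreement over calibration parameters. All names and the scoring rule are illustrative, not the authors' code.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project 3D LiDAR points into the image given 4x4 extrinsics and 3x3 intrinsics."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0                      # keep points in front of the camera
    uv = (K @ pts_cam[in_front].T).T
    return uv[:, :2] / uv[:, 2:3], in_front

def segmentation_consistency(points_lidar, lidar_labels, cam_mask, T_cam_lidar, K):
    """Fraction of projected LiDAR points whose class agrees with the camera
    segmentation mask; a calibration search could maximize this score."""
    uv, keep = project_points(points_lidar, T_cam_lidar, K)
    h, w = cam_mask.shape
    px = np.round(uv).astype(int)
    valid = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    if not valid.any():
        return 0.0
    agree = cam_mask[px[valid, 1], px[valid, 0]] == lidar_labels[keep][valid]
    return float(agree.mean())
```

A temporal offset could, in the same spirit, be scored by shifting the camera mask with optical-flow velocities before comparing, though the paper's exact formulation is not given here.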
Encryption protocols have been widely used to protect user privacy and avoid exposing private information. QUIC (Quick UDP Internet Connections), including the version originally designed by Google (GQUIC) and the version standardized by the IETF (IQUIC), has been developed as an alternative to traditional HTTP and demonstrates distinctive transmission characteristics: encrypted resource transmission over UDP, which accelerates web page rendering. However, existing TCP-based encrypted transmission schemes are vulnerable to website fingerprinting (WFP) attacks, which allow an adversary to infer the websites a user visits by eavesdropping on the transmission channel. Whether GQUIC and IQUIC can effectively resist such attacks is worth investigating. In this paper, we study the vulnerabilities of GQUIC, IQUIC, and HTTPS to WFP attacks from the perspective of traffic analysis. Extensive experiments show that, in the early-traffic scenario, GQUIC is the most vulnerable to WFP attacks among the three protocols and IQUIC is more vulnerable than HTTPS, but the vulnerabilities of the three protocols are similar in the normal full-traffic scenario. Feature-transfer analysis shows that most features are transferable between the protocols on normal full traffic. However, combined with a qualitative analysis of the latent feature representations, we find that the transfer is inefficient on early traffic, as GQUIC, IQUIC, and HTTPS exhibit significantly different variations in traffic distribution on early traffic. By upgrading the one-time WFP attack to multiple WFP Top-K attacks, we find that the attack accuracy on GQUIC and IQUIC reaches 95.4% and 95.5% respectively with only 40 packets and simple features, while it reaches only 60.7% on HTTPS. We also demonstrate that the vulnerability of IQUIC depends only slightly on the network environment.
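The paper's exact features and classifier are not given in the abstract; the sketch below is a generic closed-world WFP baseline in the spirit of the "only 40 packets, simple features" result, using signed packet sizes of the first 40 packets and a random forest over synthetic stand-in traces.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def early_traffic_features(trace, n_packets=40):
    """Signed sizes of the first n packets (+ outgoing, - incoming),
    zero-padded when the trace is shorter."""
    sizes = [size * direction for size, direction in trace[:n_packets]]
    return np.pad(sizes, (0, n_packets - len(sizes)))

# Synthetic stand-in for labeled traces: (size, direction) pairs per page
# visit, where the label identifies the visited site (closed-world setting).
rng = np.random.default_rng(0)
traces = [[(int(rng.integers(60, 1500)), int(rng.choice([-1, 1])))
           for _ in range(50)] for _ in range(400)]
labels = rng.integers(0, 10, size=400)

X = np.array([early_traffic_features(t) for t in traces])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, stratify=labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("closed-world accuracy:", clf.score(X_te, y_te))  # near chance on random data
```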
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
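The contents of the formulaic knowledge bank and the internals of ReGrouP are not detailed in the abstract; as a hypothetical illustration of how formulaic knowledge could assist parsing, the sketch below retrieves formulas triggered by domain terms in the question and prepends them to the parser input. The bank entries and the matching rule are invented for illustration.

```python
# Hypothetical entries in a formulaic knowledge bank: each maps a domain term
# to the formula (over table columns) that defines it.
KNOWLEDGE_BANK = {
    "gross margin": "(revenue - cost) / revenue",
    "year-over-year growth": "(value_this_year - value_last_year) / value_last_year",
}

def attach_formulaic_knowledge(question: str) -> str:
    """Retrieve formulas whose trigger term appears in the question and
    prepend them to the parser input, so a text-to-SQL model can ground
    domain jargon in concrete column arithmetic."""
    hits = [f"{term} = {formula}"
            for term, formula in KNOWLEDGE_BANK.items() if term in question.lower()]
    return " ; ".join(hits) + " | " + question if hits else question

print(attach_formulaic_knowledge("What was the gross margin of product A in 2021?"))
```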
The node-place model has been widely used to classify and evaluate transit stations, which sheds light on individual travel behaviors and supports urban planning through effectively integrating land use and transportation development. This article adapts this model to investigate whether and how node, place, and mobility are associated with the transmission risks and presence of local COVID-19 cases in a city. To our knowledge, similar studies on the model and its relevance to COVID-19 have not been undertaken before. Moreover, a unique metric drawn from the detailed visit histories of the infected, i.e., the COVID-19 footprints, is proposed and exploited. This study then empirically uses the adapted model to examine the station-level factors affecting the local COVID-19 footprints. The model accounts for traditional measures of the node and place as well as actual human mobility patterns associated with the node and place. It finds that stations with high node, place, and human mobility indices normally have more COVID-19 footprints in proximity. A multivariate regression is fitted to see whether and to what degree different indices and indicators can predict the COVID-19 footprints. The results indicate that many of the place, node, and human mobility indicators significantly affect the concentration of COVID-19 footprints. These findings are useful for policy-makers to predict and monitor hotspots of COVID-19 and other pandemic transmission.
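As a minimal sketch of the kind of multivariate regression described, the snippet below fits an OLS model of footprint counts on hypothetical node, place, and mobility indices; the data are synthetic and the variable set is assumed, not the paper's.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical station-level table: one row per transit station, with node,
# place, and mobility indices plus nearby COVID-19 footprint counts.
rng = np.random.default_rng(0)
n = 120
X = np.column_stack([
    rng.normal(size=n),   # node index (e.g., network centrality, service level)
    rng.normal(size=n),   # place index (e.g., land-use density and diversity)
    rng.normal(size=n),   # mobility index (observed visit volume)
])
y = 2.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1] + 1.2 * X[:, 2] + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())   # coefficients and p-values per index
```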
Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Fields, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, so it can separately control attributes such as face identity, illumination, and eye gaze direction. Diverse 3D-aware gaze datasets can thus be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
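The redirection model itself is beyond an abstract-level sketch, but datasets generated this way are conventionally evaluated with the angular error between predicted and ground-truth gaze vectors, a standard metric in the field (not specific to this paper):

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Mean angular error (degrees) between predicted and ground-truth
    3D gaze vectors, each given as an (N, 3) array."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```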
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
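SAVe's augmentation and verification steps are not specified in the abstract; one plausible reading is sketched below: propose query variants over alternative schema surface forms and keep only those that still execute against the database. The table, candidate queries, and synonym idea are illustrative, not the authors' pipeline.

```python
import sqlite3

def executes(conn, sql):
    """Verification step: keep an augmented example only if its SQL runs."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, birth_year INT)")

# Hypothetical augmentation: rewrite the query against synonym column names,
# then verify which variants are consistent with the actual schema.
candidates = [
    "SELECT name FROM singer WHERE birth_year > 1990",    # original
    "SELECT name FROM singer WHERE year_of_birth > 1990", # synonym not in schema
]
kept = [sql for sql in candidates if executes(conn, sql)]
print(kept)  # only the first candidate survives verification
```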
Practical applications employing deep learning must guarantee inference quality. However, we found that the inference quality of the state-of-the-art and the state-of-the-practice in practical applications has a long-tail distribution. In the real world, many tasks have strict requirements for the quality of deep learning inference, such as safety-critical and mission-critical tasks. The fluctuation of inference quality seriously affects its practical applications, and the quality at the tail may lead to severe consequences. State-of-the-art and state-of-the-practice models that achieve outstanding inference quality when designed and trained under loose constraints may still deliver poor inference quality under constraints of practical significance. On the one hand, neural network models must be deployed on complex systems with limited resources. On the other hand, safety-critical and mission-critical tasks need to meet additional metric constraints while ensuring high inference quality. We coin a new term, ``tail quality,'' to characterize this essential requirement and challenge. We also propose a new metric, ``X-Critical-Quality,'' to measure inference quality under certain constraints. This article reveals factors contributing to the failure of state-of-the-art and state-of-the-practice algorithms and systems in real scenarios. Therefore, we call for establishing innovative methodologies and tools to tackle this enormous challenge.
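Neither ``tail quality'' nor ``X-Critical-Quality'' is formally defined in the abstract; the sketch below is one plausible operationalization under stated assumptions: quality is scored per evaluation run, runs that violate a latency budget count as zero quality, and the metric reports quality at a tail percentile rather than the mean.

```python
import time
import numpy as np

def tail_quality(run_inference, inputs, latency_budget_s, n_trials=100, pct=99):
    """Hypothetical reading of the 'tail quality' idea: repeat evaluation
    runs, zero out any run that misses the latency budget, and report the
    quality at the worst percentile instead of the average."""
    scores = []
    for _ in range(n_trials):
        t0 = time.perf_counter()
        quality = run_inference(inputs)          # e.g., accuracy on a batch
        elapsed = time.perf_counter() - t0
        scores.append(quality if elapsed <= latency_budget_s else 0.0)
    return float(np.percentile(scores, 100 - pct))  # quality at the tail

# Demo with a toy model whose per-run quality fluctuates.
demo = lambda _: float(np.random.default_rng().uniform(0.9, 0.99))
print(tail_quality(demo, None, latency_budget_s=0.1))
```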
Previous computation models either have equivalent abilities in representing all computations but fail to provide primitive operators for programming complex algorithms, or lack the generalized expression ability to represent newly-added computations. This article presents a unified computation model with generalized expression ability and a concise set of primitive operators for programming high-level algorithms. We propose a unified data abstraction -- Tensor of List -- and offer a unified computation model based on it, which we call the ToL model (in short, ToL). ToL introduces five atomic computations that can represent any elementary computation by finite composition, ensured with a strict formal proof. Based on ToL, we design a pure-functional language -- ToLang. ToLang provides a concise set of primitive operators that can be used to program complex big data and AI algorithms. Our evaluations show that ToL has generalized expression ability and a built-in performance indicator, backed by a strictly defined computation metric -- the elementary operation count (EOPs) -- which is consistent with FLOPs within a small error range.
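The five atomic computations and the exact Tensor of List definition are not given in the abstract, so the sketch below is merely a hypothetical stand-in showing how such an abstraction could carry a built-in EOP counter alongside its payload.

```python
from dataclasses import dataclass

@dataclass
class ToL:
    """Minimal stand-in for a 'Tensor of List': a nested-list payload plus a
    running count of elementary operations (EOPs). The real atomic
    computations are not enumerated in the abstract; 'map_elementwise'
    below is purely illustrative."""
    data: list
    eops: int = 0

    def map_elementwise(self, fn):
        # Each element application counts as one elementary operation.
        return ToL([fn(x) for x in self.data], self.eops + len(self.data))

t = ToL([1, 2, 3]).map_elementwise(lambda x: x * x)
print(t.data, t.eops)  # [1, 4, 9] 3
```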
Medical Visual Question Answering (Medical-VQA) aims to answer clinical questions regarding radiology images, assisting doctors with decision-making options. Nevertheless, current Medical-VQA models learn cross-modal representations through vision and text encoders residing in dual separate spaces, which leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking. Specifically, to learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with a gradually soft parameter-sharing strategy. Technically, the proposed strategy learns a constraint that keeps the vision and text encoders close in the same space, which is gradually loosened as the number of layers increases. Moreover, to grasp the semantic representation, we extend the unified Adversarial Masking data augmentation strategy to the contrastive representation learning of vision and text in a unified manner, alleviating the meaninglessness of the commonly used random masking. Concretely, while the encoder training minimizes the distance between the original feature and the masked feature, the adversarial masking model keeps adversarial learning to conversely maximize the distance. Furthermore, we further explore the unified adversarial masking strategy, which improves potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on the VQA-RAD and SLAKE public benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA models. More importantly, we provide an additional discussion of the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaptation performance in practical disease diagnosis.
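The paper's soft parameter-sharing schedule is not specified in the abstract; the sketch below shows one way such a "gradually loosened" constraint could look, penalizing the distance between paired vision and text encoder layers with a weight that decays for higher layers. It assumes the paired layers have identical parameter shapes.

```python
import torch
import torch.nn as nn

def soft_sharing_penalty(vision_layers, text_layers, base_weight=1e-2):
    """Penalize the distance between paired encoder layers, with a weight
    that decays for higher layers so the sharing constraint is gradually
    loosened (the exact schedule in UnICLAM is not given in the abstract)."""
    penalty = 0.0
    n = len(vision_layers)
    for i, (v, t) in enumerate(zip(vision_layers, text_layers)):
        weight = base_weight * (1.0 - i / n)   # looser at higher layers
        for pv, pt in zip(v.parameters(), t.parameters()):
            penalty = penalty + weight * (pv - pt).pow(2).sum()
    return penalty

# Toy usage with shape-matched stand-in encoders.
v = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
t = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
print(soft_sharing_penalty(v, t).item())
```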
Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references. Due to the scarcity of human-labeled QE data, previous works attempted to utilize abundant unlabeled parallel corpora to produce additional training data with pseudo labels. In this paper, we demonstrate a significant gap between parallel data and real QE data: for QE data, it is strictly guaranteed that the source side is original text and the target side is translated (namely translationese). For parallel data, however, the direction is indiscriminate, and translationese may occur on either the source or target side. We compare the impact of parallel data with different translation directions in QE data augmentation, and find that using the source-original part of a parallel corpus consistently outperforms its target-original counterpart. Moreover, since the WMT corpus lacks direction information for each parallel sentence, we train a classifier to distinguish source- and target-original bitext, and carry out an analysis of their differences in both style and domain. Together, these findings suggest using source-original parallel data for QE data augmentation, which brings a relative improvement of up to 4.0% and 6.4% over undifferentiated data on sentence- and word-level QE tasks respectively.
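The paper's direction classifier is not described beyond its purpose; a minimal sketch, assuming character n-gram style cues help separate original text from translationese, could look like this (toy data included only so the snippet runs):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy illustration: label 1 = the source side is original text, label 0 = the
# source side reads as translationese (the pair is target-original). Real
# training would use bitext with known translation direction.
src_sentences = [
    "the cat sat on the mat", "rain is expected later today",
    "he made a decision to go", "it is necessary to mention that",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(src_sentences, labels)
print(clf.predict(["she decided to leave early"]))
```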